29 research outputs found

    Maximum parsimony xor haplotyping by sparse dictionary selection

    Background: Xor-genotype is a cost-effective alternative to the genotype sequence of an individual. Recent methods developed for haplotype inference have aimed at finding the solution based on xor-genotype data. Given the xor-genotypes of a group of unrelated individuals, it is possible to infer the haplotype pairs for each individual with the aid of a small number of regular genotypes. Results: We propose a framework of maximum parsimony inference of haplotypes based on the search of a sparse dictionary, and we present a greedy method that can effectively infer the haplotype pairs given a set of xor-genotypes augmented by a small number of regular genotypes. We test the performance of the proposed approach on synthetic data sets with different numbers of individuals and SNPs, and compare its performance with the state-of-the-art xor-haplotyping methods PPXH and XOR-HAPLOGEN. Conclusions: Experimental results show good inference quality for the proposed method under all circumstances, especially on large data sets. Results on a real database, CFTR, also demonstrate significantly better performance. The proposed algorithm is also capable of finding accurate solutions with missing data and/or typing errors.
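    As a rough illustration of the data type this abstract refers to: under a binary (0/1) allele encoding, a xor-genotype records per SNP only whether the site is heterozygous, which corresponds to the element-wise XOR of the two haplotypes. The sketch below assumes that encoding; the function names are hypothetical and are not taken from the paper, and the code shows only the data representation, not the proposed inference method.

```python
def xor_genotype(hap_a, hap_b):
    """Element-wise XOR of two 0/1 haplotypes: 1 marks a heterozygous site."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same number of SNPs")
    return [a ^ b for a, b in zip(hap_a, hap_b)]


def regular_genotype(hap_a, hap_b):
    """Element-wise sum of two 0/1 haplotypes: 0, 1 or 2 copies of allele 1."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same number of SNPs")
    return [a + b for a, b in zip(hap_a, hap_b)]


# Illustrative haplotype pair (made-up values).
h1 = [0, 1, 1, 0]
h2 = [0, 0, 1, 1]

xg = xor_genotype(h1, h2)      # heterozygous sites only
rg = regular_genotype(h1, h2)  # full genotype

print(xg)
print(rg)
```

    In this example the first and third SNPs both have xor value 0 even though one is homozygous for allele 0 and the other for allele 1. That ambiguity is why a small number of regular genotypes is needed alongside the xor-genotypes.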

    Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA

    Background: DNA pooling constitutes a cost-effective alternative in genome-wide association studies. In DNA pooling, equimolar amounts of DNA from different individuals are mixed into one sample and the frequency of each allele in each position is observed in a single genotype experiment. The identification of haplotype frequencies from pooled data in addition to single locus analysis is of separate interest within these studies as haplotypes could increase statistical power and provide additional insight. Results: We developed a method for maximum-parsimony haplotype frequency estimation from pooled DNA data based on the sparse representation of the DNA pools in a dictionary of haplotypes. Extensions to scenarios where data is noisy or even missing are also presented. The resulting method is first applied to simulated data based on the haplotypes and their associated frequencies of the AGT gene. We further evaluate our methodology on datasets consisting of SNPs from the first 7 Mb of the HapMap CEU population. Noise and missing data were further introduced in the datasets in order to test the extensions of the proposed method. Both HIPPO and HAPLOPOOL were also applied to these datasets to compare performances. Conclusions: We evaluate our methodology on scenarios where pooling is more efficient relative to individual genotyping; that is, in datasets that contain pools with a small number of individuals. We show that in such scenarios our methodology outperforms state-of-the-art methods such as HIPPO and HAPLOPOOL.
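    The "sparse representation of the DNA pools in a dictionary of haplotypes" can be pictured with a small forward model: the allele frequency observed at each SNP of a pool is the frequency-weighted sum of the haplotypes present in it. The sketch below assumes 0/1-coded haplotypes and uses made-up values and hypothetical function names. It covers only this forward relationship and a count of the haplotypes in use, not the estimation method of the paper.

```python
def pool_allele_frequencies(haplotypes, freqs):
    """Per-SNP frequency of allele 1 in a pool.

    haplotypes: list of 0/1 lists (the dictionary), all of the same length
    freqs:      one non-negative weight per haplotype, summing to 1
    """
    if len(haplotypes) != len(freqs):
        raise ValueError("need exactly one frequency per haplotype")
    n_snps = len(haplotypes[0])
    return [
        sum(f * h[j] for h, f in zip(haplotypes, freqs))
        for j in range(n_snps)
    ]


def support_size(freqs, tol=1e-12):
    """Number of haplotypes with non-zero frequency (smaller = more parsimonious)."""
    return sum(1 for f in freqs if f > tol)


# Illustrative dictionary of four haplotypes over three SNPs (made-up values).
dictionary = [
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
freqs = [0.5, 0.25, 0.25, 0.0]  # the last haplotype is absent from the pool

print(pool_allele_frequencies(dictionary, freqs))
print(support_size(freqs))
```

    A maximum-parsimony estimate would look for a frequency vector that reproduces the observed pool frequencies while keeping the support size as small as possible.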

    Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites

    Finding conserved motifs in genomic sequences represents one of the essential problems in bioinformatics. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI, a sequential Monte Carlo motif-identification algorithm, which is based on a position weight matrix model that does not require additional constraints and is able to estimate such motif properties as length, logo, number of instances and their locations solely on the basis of primary nucleotide sequence data. Furthermore, should biologically meaningful information about motif attributes be available, BAMBI takes advantage of this knowledge to further refine the discovery results. In practical applications, we show that the proposed approach can be used to find sites of such diverse DNA-binding molecules as the cAMP receptor protein (CRP) and Din-family site-specific serine recombinases. Results obtained by BAMBI in these and other settings demonstrate better statistical performance than any of the four widely-used profile-based motif discovery methods: MEME, BioProspector with BioOptimizer, SeSiMCMC and Motif Sampler as measured by the nucleotide-level correlation coefficient. Additionally, in the case of Din-family recombinase target site discovery, the BAMBI-inferred motif is found to be the only one functionally accurate from the underlying biochemical mechanism standpoint. C++ and Matlab code is available at http://www.ee.columbia.edu/~guido/BAMBI or http://genomics.lbl.gov/BAMBI/

    The impact of arterial input function determination variations on prostate dynamic contrast-enhanced magnetic resonance imaging pharmacokinetic modeling: a multicenter data analysis challenge, part II

    This multicenter study evaluated the effect of variations in arterial input function (AIF) determination on pharmacokinetic (PK) analysis of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) data using the shutter-speed model (SSM). Data acquired from eleven prostate cancer patients were shared among nine centers. Each center used a site-specific method to measure the individual AIF from each data set and submitted the results to the managing center. These AIFs, their reference tissue-adjusted variants, and a literature population-averaged AIF, were used by the managing center to perform SSM PK analysis to estimate Ktrans (volume transfer rate constant), ve (extravascular, extracellular volume fraction), kep (efflux rate constant), and τi (mean intracellular water lifetime). All other variables, including the definition of the tumor region of interest and precontrast T1 values, were kept the same to evaluate parameter variations caused by variations in only the AIF. Considerable PK parameter variations were observed with within-subject coefficient of variation (wCV) values of 0.58, 0.27, 0.42, and 0.24 for Ktrans, ve, kep, and τi, respectively, using the unadjusted AIFs. Use of the reference tissue-adjusted AIFs reduced variations in Ktrans and ve (wCV = 0.50 and 0.10, respectively), but had smaller effects on kep and τi (wCV = 0.39 and 0.22, respectively). kep is less sensitive to AIF variation than Ktrans, suggesting it may be a more robust imaging biomarker of prostate microvasculature. With low sensitivity to AIF uncertainty, the SSM-unique τi parameter may have advantages over the conventional PK parameters in a longitudinal study.
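    The abstract reports within-subject coefficients of variation (wCV) without stating how they were computed. One common definition, assumed here and not confirmed by the source, takes the coefficient of variation of each subject's repeated estimates (sample standard deviation divided by the mean) and combines them as a root mean square. The sketch below uses made-up numbers and a hypothetical function name.

```python
import math
import statistics


def within_subject_cv(measurements_per_subject):
    """Root-mean-square within-subject coefficient of variation.

    measurements_per_subject: list of lists; each inner list holds the repeated
    estimates of one parameter for one subject (e.g. one value per center).
    Assumes at least two positive values per subject.
    """
    squared_cvs = []
    for values in measurements_per_subject:
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (n - 1)
        squared_cvs.append((sd / mean) ** 2)
    return math.sqrt(statistics.mean(squared_cvs))


# Illustrative (made-up) estimates: two subjects, two centers each.
example = [
    [1.0, 3.0],  # centers disagree strongly for this subject
    [2.0, 2.0],  # centers agree exactly for this subject
]
print(within_subject_cv(example))
```

    Under this definition a value of 0 means every center produced the same estimate for each subject, and larger values indicate greater disagreement relative to the mean.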

    Maximum-Parsimony Haplotype Inference Based on Sparse Representations of Genotypes


    Space-time coding for MIMO radar detection and ranging

    Space-time coding (STC) has been shown to play a key role in the design of MIMO radars with widely spaced antennas: In particular, rank-one coding amounts to using the multiple transmit antennas as power multiplexers, while full-rank coding maximizes the transmit diversity, compromises between the two being possible through rank-deficient coding. In detecting a target at known distance and Doppler frequency, no uniformly optimum transmit policy exists, and diversity maximization turns out to be the way to go only in a (still unspecified) large signal-to-noise ratio region. The aim of this paper is to shed some light on the optimum transmit policy as the radar is to detect a target at an unknown location: To this end, at first the Cramér-Rao bounds as a function of the STC matrix are computed, and then waveform design is stated as a constrained optimization problem, where now the constraint concerns also the accuracy in target ranging, encapsulated in the Fisher Information on the range estimate. Results indicate that such accuracy constraints may visibly modify the required transmit policy and lead to rank-deficient STC also in regions where pure detection would require pursuing full transmit diversity.

    Quantitative liver MRI combining phase contrast imaging, elastography, and DWI: assessment of reproducibility and postprandial effect at 3.0 T.

    To quantify short-term reproducibility (in fasting conditions) and postprandial changes after a meal in portal vein (PV) flow parameters measured with phase contrast (PC) imaging, liver diffusion parameters measured with multiple b value diffusion-weighted imaging (DWI) and liver stiffness (LS) measured with MR elastography (MRE) in healthy volunteers and patients with liver disease at 3.0 T. In this IRB-approved prospective study, 30 subjects (11 healthy volunteers and 19 liver disease patients; 23 males, 7 females; mean age 46.5 y) were enrolled. Imaging included 2D PC imaging, multiple b value DWI and MRE. Subjects were initially scanned twice in fasting state to assess short-term parameter reproducibility, and then scanned 20 min after a liquid meal. PV flow/velocity, LS, liver true diffusion coefficient (D), pseudodiffusion coefficient (D*), perfusion fraction (PF) and apparent diffusion coefficient (ADC) were measured in fasting and postprandial conditions. Short-term reproducibility was assessed in fasting conditions by measuring coefficients of variation (CV) and Bland-Altman limits of agreement. Differences in MR metrics before and after caloric intake and between healthy volunteers and liver disease patients were assessed. PV flow parameters, D, ADC and LS showed good to excellent short-term reproducibility in fasting state (CV <16%), while PF and D* showed acceptable and poor reproducibility (CV = 20.4% and 51.6%, respectively). PV flow parameters and LS were significantly higher (p<0.04) in postprandial state while liver diffusion parameters showed no significant change (p>0.2). LS was significantly higher in liver disease patients compared to healthy volunteers both in fasting and postprandial conditions (p<0.001). Changes in LS were significantly correlated with changes in PV flow (Spearman rho = 0.48, p = 0.013). Caloric intake had no/minimal/large impact on diffusion/stiffness/portal vein flow, respectively. PC MRI and MRE but not DWI should be performed in controlled fasting state.